Assessing Reliability on Annotations (1): Theoretical Considerations

Authors

Jens Stegmann

Andy Lücking

Abstract

This is the first part of a two-report mini-series focusing on issues in the evaluation of annotations. In this theoretically oriented report we lay out the relevant statistical background for reliability studies, evaluate some pertinent approaches, and sketch some arguments that may lend themselves to the development of an original statistic. The second, more empirically oriented report [Lücking and Stegmann, 2005] contains a description of the project background, including the documentation of the annotation scheme at stake and the empirical data collected, as well as the results of applying the relevant statistics in practice and a discussion of those results. The following points are dealt with in detail here. We summarize and contribute to an argument by Gwet [2001] which indicates that the popular pi and kappa statistics [Carletta, 1996] are generally not appropriate for assessing the degree of agreement between raters on categorical type-ii data. We propose the use of AC1 [Gwet, 2001] instead, since it has desirable mathematical properties that make it more appropriate for assessing the results of expert raters in general. As far as type-i data are concerned, we make use of conventional correlation statistics which, unlike their AC1 and kappa cousins, do not deliver results that are adjusted for agreement due to chance. Furthermore, we discuss issues in the interpretation of the results of the different statistics. Finally, we take up some loose ends from the previous chapters and sketch some advanced ideas pertaining to inter-rater agreement statistics, highlighting some differences as well as common ground between Gwet’s perspective and our own stance. We conclude with some preliminary suggestions regarding the development of the original statistic omega, which will be different in nature from those discussed before.
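To make the contrast between the chance-corrected statistics named in the abstract concrete, the following is a minimal sketch for two raters and nominal categories. It uses the commonly cited two-rater formulas for Scott's pi, Cohen's kappa and Gwet's AC1, which are assumed here rather than taken from the report itself; the function name `agreement_stats` and the example labels are hypothetical.

```python
from collections import Counter


def agreement_stats(rater_a, rater_b, categories=None):
    """Observed agreement plus Scott's pi, Cohen's kappa and Gwet's AC1.

    Assumes two raters, nominal categories, and one label per item.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    if categories is None:
        # Assumption: the category set is exactly the labels that were used.
        categories = set(rater_a) | set(rater_b)
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories are needed")

    # Observed proportion of items on which the raters agree.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Marginal proportions per rater, and their mean per category.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    marg_a = {c: count_a[c] / n for c in categories}
    marg_b = {c: count_b[c] / n for c in categories}
    mean = {c: (marg_a[c] + marg_b[c]) / 2 for c in categories}

    # Chance-agreement terms; this is where the three statistics differ.
    pe_pi = sum(mean[c] ** 2 for c in categories)
    pe_kappa = sum(marg_a[c] * marg_b[c] for c in categories)
    pe_ac1 = sum(mean[c] * (1 - mean[c]) for c in categories) / (q - 1)

    def corrected(p_e):
        # Undefined when chance agreement is already 1.
        if p_e == 1:
            return float("nan")
        return (p_a - p_e) / (1 - p_e)

    return {
        "observed": p_a,
        "pi": corrected(pe_pi),
        "kappa": corrected(pe_kappa),
        "ac1": corrected(pe_ac1),
    }


# Hypothetical, heavily skewed data: the raters agree on 8 of 10 items.
a = ["yes"] * 8 + ["yes", "no"]
b = ["yes"] * 8 + ["no", "yes"]
stats = agreement_stats(a, b)
```

On this skewed example the observed agreement is 0.8, yet pi and kappa both come out slightly negative (about -0.11), while AC1 is about 0.76. This sensitivity to skewed category distributions is one frequently discussed reason for preferring AC1; whether it matches the report's own argument in detail is not confirmed by the abstract.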


Similar resources

Assessing Reliability on Annotations (2): Statistical Results for the deikon Scheme

This is the second part of a two-report mini-series focussing on issues in the evaluation of annotations. In this empirically-oriented report we lay out the documentation of the annotation scheme used in the deikon project, discuss the results obtained in a respective reliability study and conclude with some suggestions regarding forthcoming versions of the scheme. Relevant statistical backgrou...


The Effects of Multimedia Annotations on Iranian EFL Learners’ L2 Vocabulary Learning

In our modern technological world, Computer-Assisted Language learning (CALL) is a new realm towards learning a language in general, and learning L2 vocabulary in particular. It is assumed that the use of multimedia annotations promotes language learners’ vocabulary acquisition. Therefore, this study set out to investigate the effects of different multimedia annotations (still picture annotatio...


Assessment of Infant Movement With a Compact Wireless Accelerometer System

There is emerging data that patterns of motor activity early in neonatal life can predict impairments in neuromotor development. However, current techniques to monitor infant movement mainly rely on observer scoring, a technique limited by skill, fatigue, and inter-rater reliability. Consequently, we tested the use of a lightweight, wireless, accelerometer system that measures movement and can ...


Content Analysis Table of Medical Ethics Book Based on Allport’s Theory of Value System

Introduction: Regular assessment of academic textbooks and revision of teaching methods are critical for making such textbooks more efficient in meeting the needs of the new generation and conveying values to them. Therefore, in line with the necessity of textbook evaluation, this research examined the extent to which the Medical Ethics book named “physicians and ethical considerations” observe...


Vox Populi Annotation: Measuring Intensity of Ideological Perspectives by Aggregating Group Judgments

Polarizing discussions about political and social issues are common in mass media. Annotations on the degree to which a sentence expresses an ideological perspective can be valuable for evaluating computer programs that can automatically identify strongly biased sentences, but such annotations remain scarce. We annotated the intensity of ideological perspectives expressed in 250 sentences by ag...




Publication date: 2005
